The datasets
Data processing
Network architecture
Summary and Conclusions
What is new?
- the use of the transformer (the same architecture that powers ChatGPT and many recent AI models)
- integrating it with fMRI data in a series of tasks
How did they do it?
[Figure: a transformer model takes fMRI data and predicts age, gender, and schizophrenia]
[Figure: the two normalization schemes]
Global normalization: one mean and stdev computed over all voxels of the scan,
  mean = (1 / number of voxels) · Σ_voxels voxel,  stdev² = (1 / number of voxels) · Σ_voxels (voxel − mean)²
Voxel normalization: a mean and stdev per voxel, computed across volumes,
  mean = (1 / number of volumes) · Σ_volumes voxel,  stdev² = (1 / number of volumes) · Σ_volumes (voxel − mean)²
Either way, the data entering the network are z-scores.
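The two normalization schemes can be sketched as follows; a minimal sketch assuming a 4-D array of shape (volumes, x, y, z), with illustrative function names:

```python
import numpy as np

def global_normalize(scan, eps=1e-8):
    """z-score with one mean/stdev over the whole 4-D scan."""
    return (scan - scan.mean()) / (scan.std() + eps)

def voxel_normalize(scan, eps=1e-8):
    """z-score each voxel with its own mean/stdev across volumes (axis 0)."""
    mean = scan.mean(axis=0, keepdims=True)
    std = scan.std(axis=0, keepdims=True)
    return (scan - mean) / (std + eps)
```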
[Figure: autoencoder pipeline: z-scores (global normalization) → Encoder → Decoder]
[Figure: BERT input and output stages]
Input stage: positional encoding and token type embedding, adding the CLS token, then layer normalization and dropout. Each token has 2640 elements (hidden_size); max_position_embeddings caps the sequence length.
Inside each layer, the feed-forward part computes GELU(linear(x)) and then normalization(dropout(linear(hidden_state)) + x).
Outputs: last_hidden_state and pooler_output (the processed CLS token); in the fMRI paper, the processed CLS token goes to the bottleneck out instead.
[Figure: one BERT layer]
Self attention: three linear projections produce Q, K, and V; attention is softmax(QKᵀ / √d_k) · V, followed by linear → dropout → residual add (+) → layer norm.
Feed-forward: linear (2640→3072) → GELU → linear (3072→2640) → dropout → residual add (+) → layer norm.
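Such a layer can be sketched in PyTorch; this uses `torch.nn.MultiheadAttention` rather than the authors' code, and the small demo dimensions are illustrative:

```python
import torch
import torch.nn as nn

class BertLayer(nn.Module):
    """One encoder layer: self-attention then a feed-forward block, each
    followed by dropout, a residual connection, and layer norm (post-norm)."""
    def __init__(self, hidden=2640, ffn=3072, heads=8, p=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, dropout=p, batch_first=True)
        self.drop = nn.Dropout(p)
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(),
                                nn.Linear(ffn, hidden), nn.Dropout(p))
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        a, _ = self.attn(x, x, x)         # softmax(QK^T / sqrt(d_k)) V
        x = self.norm1(x + self.drop(a))  # residual add + layer norm
        x = self.norm2(x + self.ff(x))    # feed-forward, residual add + layer norm
        return x

# Illustrative small dimensions (the paper-scale layer would be 2640/3072):
layer = BertLayer(hidden=16, ffn=32, heads=4, p=0.0)
out = layer(torch.randn(2, 21, 16))
```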
[Figure: the transformer encoder]
Positional embedding, then a stack of BERT layers. The input is 21×2640 (the first token is CLS); the output is 21×2640 (the first token is the processed CLS).
BERT: Bidirectional Encoder Representations from Transformers.
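Forming the 21×2640 input (prepending a learned CLS token and adding positional embeddings) might look like this; the zero initialization of the CLS token is an assumption:

```python
import torch
import torch.nn as nn

hidden, n_volumes = 2640, 20                   # 20 volume tokens + 1 CLS = 21
seq = torch.randn(1, n_volumes, hidden)        # encoded fMRI volumes (illustrative)
cls = nn.Parameter(torch.zeros(1, 1, hidden))  # learned CLS token (zero init is an assumption)
x = torch.cat([cls, seq], dim=1)               # first one is CLS -> 21 tokens
pos = nn.Embedding(n_volumes + 1, hidden)      # positional embedding table
x = x + pos(torch.arange(n_volumes + 1)).unsqueeze(0)
assert x.shape == (1, 21, hidden)
```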
[Figure: inference pipeline: z-scores (global normalization) → Encoder → bottleneck in → BERT]
QUESTIONS
Thank you
Key Innovations
Novel application of transformers to fMRI data analysis
Three-phase training approach: autoencoder, transformer pre-training, and task-specific
fine-tuning
Effective combination of CNNs and transformers for spatiotemporal fMRI processing
Major Findings
State-of-the-art performance on multiple fMRI prediction tasks:
Age prediction: L1 error of 2.73 years
Gender classification: 94.09% accuracy
Schizophrenia detection: Up to 88.2% accuracy (CNP dataset)
Strengths
Versatility across different prediction tasks and datasets
Ability to capture both spatial and temporal patterns in fMRI data
Effective use of self-supervised pre-training on large fMRI datasets
Limitations and Future Directions
Limited exploration of sequence length and stride parameters
Potential for further optimization of model architecture
Opportunity for more extensive ablation studies
Implications
New possibilities for advanced fMRI analysis in neuroscience and clinical applications
Potential for improved understanding of brain function and neurological disorders
Framework for applying transformer models to other types of medical imaging data
Conclusion
TFF demonstrates the power of adapting advanced AI techniques to neuroimaging,
opening new avenues for brain research and clinical diagnostics.
Phase 1: autoencoder pre-training
Phase 2: transformer pre-training
Phase 3: fine-tuning
age? gender? schizophrenia?
[Figure: phase 1 training: z-scores (global normalization) → bottleneck in → bottleneck out → reconstructed z-scores]
Losses: an L1 loss on the reconstructed volume, an L1 loss on the top 10% of activated voxels, and an MSE loss between VGG network features of the input and of the reconstruction.
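A rough sketch of the combined objective; the loss weights and the top-voxel selection rule are assumptions, and `vgg_feats` stands in for a pretrained VGG feature extractor:

```python
import torch

def top_voxel_l1(pred, target, frac=0.10):
    """L1 restricted to the most strongly activated voxels of the target
    (an illustrative reading of 'top 10% of activated voxels')."""
    k = max(1, int(frac * target.numel()))
    idx = target.abs().flatten().topk(k).indices
    return (pred.flatten()[idx] - target.flatten()[idx]).abs().mean()

def reconstruction_loss(pred, target, vgg_feats=None, w=(1.0, 1.0, 1.0)):
    """Full-volume L1 + top-voxel L1 + optional MSE between features of a
    pretrained network (vgg_feats is a stand-in for the VGG extractor)."""
    loss = w[0] * (pred - target).abs().mean() + w[1] * top_voxel_l1(pred, target)
    if vgg_feats is not None:
        loss = loss + w[2] * ((vgg_feats(pred) - vgg_feats(target)) ** 2).mean()
    return loss
```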
Encoder
[Figure: encoder overview; input shape (batch×T, ch, W, H, D)]
An initial conv3d (kernel=3, padding=1, stride=1) maps the 2-channel input to 4ch at resolution (dim_0, depth); down blocks 1, 2, and 3 then double the channels at each step, giving 8ch at (dim_1, depth×2), 16ch at (dim_2, depth×4), and 32ch at (dim_3, depth×8), followed by a final block and dropout. The encoder output is 32ch at (dim_3, depth×8).
[Figure: the "green" block (n_ch → n_ch)]
group_norm0 (grp: n_ch/4) → relu0 (leaky relu) → conv0 (conv3d, 3×3×3, stride 1, padding 1) → group_norm1 (grp: n_ch/4) → relu1 (leaky relu) → conv2 (conv3d, 3×3×3, stride 1, padding 1) → dropout
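A PyTorch sketch of this block, following the layer names in the diagram (a reconstruction, not the authors' exact code):

```python
import torch
import torch.nn as nn

class GreenBlock(nn.Module):
    """norm -> leaky relu -> conv, twice, then dropout; n_ch -> n_ch.
    Layer names follow the diagram; a sketch, not the authors' code."""
    def __init__(self, n_ch, p=0.1):
        super().__init__()
        self.group_norm0 = nn.GroupNorm(n_ch // 4, n_ch)  # grp: n_ch/4
        self.relu0 = nn.LeakyReLU()
        self.conv0 = nn.Conv3d(n_ch, n_ch, kernel_size=3, stride=1, padding=1)
        self.group_norm1 = nn.GroupNorm(n_ch // 4, n_ch)  # grp: n_ch/4
        self.relu1 = nn.LeakyReLU()
        self.conv2 = nn.Conv3d(n_ch, n_ch, kernel_size=3, stride=1, padding=1)
        self.dropout = nn.Dropout3d(p)

    def forward(self, x):  # x: (batch*T, n_ch, W, H, D)
        x = self.conv0(self.relu0(self.group_norm0(x)))
        x = self.conv2(self.relu1(self.group_norm1(x)))
        return self.dropout(x)
```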
[Figure: down block (n_ch → 2×n_ch)]
conv3d (kernel 3, padding 1, stride 2) doubling the channels, then group norm → leaky relu → conv3d (3×3×3, stride 1, padding 1)
[Figure: final block]
32ch (depth×8) → 2ch (depth/2), with group norm using grp: 8
BottleNeck in
Flatten (reshape), or a fully connected layer, mapping the dim_3-resolution feature map to a 2640-element vector.
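Both bottleneck-in options, sketched with illustrative feature-map dimensions (the real spatial dims depend on the input resolution):

```python
import torch
import torch.nn as nn

# Illustrative encoder output: 21 volumes, each a 2-channel (5, 6, 5) feature map.
feat = torch.randn(21, 2, 5, 6, 5)

flat = feat.flatten(start_dim=1)            # option 1: flatten / reshape
to_hidden = nn.Linear(flat.shape[1], 2640)  # option 2: fully connected to hidden_size
tokens = to_hidden(flat)
assert tokens.shape == (21, 2640)           # one 2640-element token per volume
```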
[Figure: fine-tuning heads]
Gender classification (binary classification): linear → sigmoid, trained with a BCE loss.
Pathological classification, healthy / schizophrenia (binary classification): linear → sigmoid, trained with a BCE loss.
Age prediction (regression): linear → leaky relu, compared to the real age with an L1 loss.
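The heads can be sketched as follows; the intermediate width of the age head is an assumption:

```python
import torch
import torch.nn as nn

hidden = 2640  # size of the processed CLS token

# Binary heads (gender; healthy vs. schizophrenia): linear -> sigmoid, BCE loss.
binary_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

# Age head (regression): linear -> leaky relu -> linear, L1 loss against real age.
# The intermediate width (hidden) is an assumption.
age_head = nn.Sequential(nn.Linear(hidden, hidden), nn.LeakyReLU(), nn.Linear(hidden, 1))

cls_token = torch.randn(4, hidden)          # a batch of processed CLS tokens
prob = binary_head(cls_token)               # probabilities in (0, 1)
bce = nn.BCELoss()(prob, torch.ones(4, 1))  # binary classification loss
l1 = nn.L1Loss()(age_head(cls_token), torch.full((4, 1), 30.0))  # regression loss
```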
[Figure: the decoder reuses the same "green" block: group_norm0 → relu0 → conv0 → group_norm1 → relu1 → conv2 → dropout (n_ch → n_ch, grp: n_ch/4)]
BottleNeck out
Unflatten the 2640-element vector back to (2ch, depth/2, dim_3), then group norm (grp: 2) → leaky relu → conv3d (3×3×3, stride 1, padding 1) up to 32ch at (dim_3, depth×8).
Decoder (class Decoder(BaseModel)): upgreen0 → upgreen1 → upgreen2, then an output_block conv3d (1×1×1, stride 1, padding 1) down to 1ch.
[Figure: decoder overview]
32ch (dim_3, depth×8) → 16ch (dim_2, depth×4) → 8ch (dim_1, depth×2) → 4ch (depth) → 4ch (dim_0, depth) → outChannels (dim_0).
Each upgreen step is a nearest-neighbour upsample (torch.nn.Upsample) followed by a conv3d (kernel=3, padding=1, stride=1); the output block is a conv3d (1×1×1, stride 1, padding 1).
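One decoder up-step (nearest-neighbour upsample via `torch.nn.Upsample`, then a conv3d) might look like the following sketch; the ×2 scale factor and channel halving are assumptions read off the diagram:

```python
import torch
import torch.nn as nn

class UpGreen(nn.Module):
    """One decoder up-step: nearest-neighbour x2 upsample, then a 3x3x3 conv
    that halves the channels (a sketch of the upgreen blocks in the diagram)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))
```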